The column labeled eruptions represents the duration of the eruption in minutes. The column labeled waiting represents the time until the next eruption in minutes.
We will start by exploring the waiting time between eruptions using a histogram. A histogram divides the range of the data into a set number of intervals (bins) and counts the number of observations (here, waiting times between eruptions) that fall into each bin. Depending on the number of bins, the resulting plot can look quite different: fewer bins show less detail, and more bins show more detail.
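To make the binning step concrete, the counting can be sketched in base R. This is only an illustration, assuming the data are R's built-in faithful data frame, which has the eruptions and waiting columns described above. The cut() and table() functions are used here to show the idea; a plotting function may place its bin edges slightly differently.

```r
# Assumption: the data are R's built-in `faithful` data frame.
waiting <- faithful$waiting

# Split the range of waiting times into 5 intervals (bins) of
# approximately equal width.
bins <- cut(waiting, breaks = 5)

# Count how many waiting times fall into each bin.
counts <- table(bins)
print(counts)
```

Each count corresponds to the height of one bar in a 5-bin histogram, and the counts add up to the total number of observations.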
The code below creates a histogram of waiting times with the data divided into 5 bins. Click the “Run code” button.
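One way such a histogram might be written is sketched here. It assumes R with the ggplot2 package installed and the built-in faithful data frame; the object name p is chosen only for illustration.

```r
# Assumption: ggplot2 is installed and the data are R's built-in
# `faithful` data frame.
library(ggplot2)

# Histogram of waiting times, with the data divided into 5 bins.
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(bins = 5)

# Draw the plot.
print(p)
```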
What pattern(s) do you observe? Are there waiting times for which the counts are high, indicating that more eruptions have that waiting time?
Try changing the code bins = 5 to bins = 10 by editing the code block above. Click the button “Run code” to remake the plot.
Try other values for bins: 20, 50, 100. Each time you edit the code, click the “Run code” button.
Was your hypothesis about the distribution of waiting times between eruptions correct? Did you find a number of bins that you think best represents the distribution of waiting times?
Footnotes
Data from Härdle, W. (1991). Smoothing Techniques with Implementation in S. New York: Springer.